On Multi-Level Machines for Continuous Speech Recognition

Author

  • Joseph Di Martino
Abstract

In this paper we introduce the concepts of multi-level machines and of multi-level dynamic programming. These machines are well suited to the difficult problem of continuous speech recognition because, firstly, they permit the integration of several knowledge sources; secondly, they allow an optimal search based on dynamic programming; and thirdly, they can deal with semantic constraints. Furthermore, these semantic constraints have the interesting property of being dynamic, i.e. they can be modified easily by the speech recognition system itself. The important consequence of this property is that the multi-level machines presented in this paper have a potential self-learning ability.

In this paper we present the concepts of multi-level machines and multi-level dynamic programming. These multi-level machines, on the one hand, permit the integration of several knowledge sources such as phonology, syntax, semantics, etc., and on the other hand can take into account local semantic constraints. As will be shown, these semantic constraints are "dynamic", in the sense that the speech recognition system can modify them easily. In section 2 we begin by showing how a 1-level machine can be built from simple finite-state automata. These automata are called cells because the entire machine is built from these elementary machines. The description is given in such a way that the iterative process for generating a general n-level machine can be inferred easily. The semantic links are also described, and we highlight the fact that, firstly, they can be modified easily and, secondly, such modifications can be carried out by the speech recognition system itself. This interesting property confers on the system a self-learning ability. In section 3 we introduce the formalism necessary to explain how an optimal search can be realized in an n-level machine. From this mathematical discussion we show that the solution found is optimal for all the levels of the machine. This other interesting property confirms the power of the n-level machines.

The basic cell of the machine is a simple finite-state machine, as illustrated by figure 1 (1).

FIGURE 1. The basic cell of a multi-level machine: a simple finite-state automaton.

The basic cell is characterized by a set of starting states ES, a set of ending states FS, and a set of terminal symbols T. For example, in the case of figure … An element of the starting state set and an element of the ending state set are singled out: they are called respectively the …
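
The basic cell described above (a finite-state automaton with a starting state set ES, an ending state set FS and a terminal symbol set T) can be pictured with a minimal Python sketch. This is an illustration only, not the paper's formalism: the class name `Cell`, the transition-table layout, the `accepts` helper and the toy states and symbols are all assumptions introduced here, and the multi-level links and dynamic programming search are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical model of a basic cell: a simple finite-state automaton."""
    starting_states: set  # ES in the text
    ending_states: set    # FS in the text
    terminals: set        # T in the text
    # Assumed layout: (state, symbol) -> set of next states.
    transitions: dict = field(default_factory=dict)

    def add_transition(self, src, symbol, dst):
        # Only symbols from the terminal set T may label a transition.
        if symbol not in self.terminals:
            raise ValueError(f"{symbol!r} is not in the terminal set T")
        self.transitions.setdefault((src, symbol), set()).add(dst)

    def accepts(self, symbols):
        """Return True if some path from ES to FS spells out `symbols`."""
        current = set(self.starting_states)
        for symbol in symbols:
            current = {
                nxt
                for state in current
                for nxt in self.transitions.get((state, symbol), ())
            }
            if not current:
                return False
        return bool(current & self.ending_states)


# Toy cell; the states 0..2 and symbols "a", "b" are invented for illustration.
cell = Cell(starting_states={0}, ending_states={2}, terminals={"a", "b"})
cell.add_transition(0, "a", 1)
cell.add_transition(1, "a", 1)
cell.add_transition(1, "b", 2)
```

Keeping ES, FS and T as plain sets mirrors the way the text characterizes a cell; how cells are linked into an n-level machine would need the definitions from section 2 of the paper.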

Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

Effects of ageing on speed and temporal resolution of speech stimuli in older adults

Background: According to previous studies, most of the speech recognition disorders in older adults are the results of deficits in audibility and auditory temporal resolution. In this paper, the effect of ageing on time-compressed speech and auditory temporal resolution by word recognition in continuous and interrupted noise was studied. Methods: A time-compressed speech test (TCST) w...

Publication date: 1987